10 research outputs found

    Policy iteration algorithm for zero-sum stochastic games with mean payoff

    We give a policy iteration algorithm to solve zero-sum stochastic games with finite state and action spaces and perfect information, when the value is defined in terms of the mean payoff per turn. This algorithm does not require any irreducibility assumption on the Markov chains determined by the strategies of the players. It is based on a discrete nonlinear analogue of the notion of reduction of a super-harmonic function.
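
    For orientation, here is a minimal sketch of Howard-style policy iteration in the much simpler one-player, unichain special case (a mean-payoff Markov decision process). The paper's contribution is precisely the setting without irreducibility assumptions, where this naive evaluation step degenerates and the reduction of super-harmonic functions is needed; all names and the data layout below are illustrative assumptions, not the paper's algorithm.

    import numpy as np

    def evaluate(P, r, policy):
        # Solve g*1 + h = r_pi + P_pi h with normalization h[0] = 0.
        # Valid when the chain induced by `policy` has one recurrent class.
        n = len(policy)
        P_pi = np.array([P[s][policy[s]] for s in range(n)])
        r_pi = np.array([r[s][policy[s]] for s in range(n)])
        A = np.zeros((n + 1, n + 1))
        b = np.zeros(n + 1)
        A[:n, 0] = 1.0                 # coefficient of the gain g
        A[:n, 1:] = np.eye(n) - P_pi   # (I - P_pi) applied to the bias h
        b[:n] = r_pi
        A[n, 1] = 1.0                  # normalization h[0] = 0
        x = np.linalg.solve(A, b)
        return x[0], x[1:]             # mean payoff g, bias vector h

    def policy_iteration(P, r):
        # P[s][a]: probability vector over next states; r[s][a]: payoff.
        n = len(P)
        policy = [0] * n
        while True:
            g, h = evaluate(P, r, policy)
            new_policy = list(policy)
            for s in range(n):
                best = r[s][policy[s]] + np.dot(P[s][policy[s]], h)
                for a in range(len(P[s])):
                    val = r[s][a] + np.dot(P[s][a], h)
                    if val > best + 1e-9:   # keep current action on ties
                        best, new_policy[s] = val, a
            if new_policy == policy:
                return g, h, policy
            policy = new_policy

    # Toy instance: in state 0, staying pays 1 per turn, while alternating
    # through state 1 pays (0 + 4)/2 = 2 per turn; the latter is optimal.
    P = [[np.array([1.0, 0.0]), np.array([0.0, 1.0])],
         [np.array([1.0, 0.0])]]
    r = [[1.0, 0.0], [4.0]]
    g, h, pi = policy_iteration(P, r)   # g = 2.0, pi = [1, 0]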

    Policy iteration algorithms for monotone contracting maps (Algorithmes d'itération sur les politiques pour les applications monotones contractantes)

    PARIS-MINES ParisTech (751062310) / Sudoc, France

    Solving multichain stochastic games with mean payoff by policy iteration

    Zero-sum stochastic games with finite state and action spaces, perfect information, and mean payoff criteria arise in particular from the monotone discretization of mean-payoff pursuit-evasion deterministic differential games. In that case, no irreducibility assumption on the Markov chains associated with strategies is satisfied (multichain games). The value of such a game can be characterized by a system of nonlinear equations, involving the mean payoff vector and an auxiliary vector (relative value or bias). Cochet-Terrasson and Gaubert proposed in (C. R. Math. Acad. Sci. Paris, 2006) a policy iteration algorithm relying on a notion of nonlinear spectral projection (Akian and Gaubert, Nonlinear Analysis TMA, 2003), which allows one to avoid cycling in degenerate iterations. We give here a complete presentation of the algorithm, with implementation details, in particular of the nonlinear projection. This has led to the software PIGAMES and allowed us to present numerical results on pursuit-evasion games.
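
    For reference, in the one-player (Markov decision process) special case, the system of nonlinear equations mentioned here reduces to the classical multichain average-payoff optimality equations; the following is textbook material rather than the paper's two-player system:

    g(s) = \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, g(s'),
    g(s) + h(s) = \max_{a \in A^*(s)} \Big( r(s, a) + \sum_{s'} P(s' \mid s, a)\, h(s') \Big),

    where g is the mean payoff (gain) vector, h the bias (relative value), and A^*(s) the set of actions attaining the first maximum. In the two-player perfect-information game, the max is replaced by a min in the states controlled by the minimizing player.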

    Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information

    Preprint arXiv:1208.0446, 34 pages. We consider zero-sum stochastic games with finite state and action spaces, perfect information, mean payoff criteria, without any irreducibility assumption on the Markov chains associated to strategies (multichain games). The value of such a game can be characterized by a system of nonlinear equations, involving the mean payoff vector and an auxiliary vector (relative value or bias). We develop here a policy iteration algorithm for zero-sum stochastic games with mean payoff, following an idea of two of the authors (Cochet-Terrasson and Gaubert, C. R. Math. Acad. Sci. Paris, 2006). The algorithm relies on a notion of nonlinear spectral projection (Akian and Gaubert, Nonlinear Analysis TMA, 2003), which is analogous to the notion of reduction of super-harmonic functions in linear potential theory. To avoid cycling, at each degenerate iteration (in which the mean payoff vector is not improved), the new relative value is obtained by reducing the earlier one. We show that the sequence of values and relative values satisfies a lexicographical monotonicity property, which implies that the algorithm does terminate. We illustrate the algorithm by a mean-payoff version of Richman games (stochastic tug-of-war or discrete infinity Laplacian type equation), in which degenerate iterations are frequent. We report numerical experiments on large scale instances, arising from the latter games, as well as from monotone discretizations of a mean-payoff pursuit-evasion deterministic differential game.
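
    The Richman (tug-of-war) example class is concrete enough to sketch. In a minimal made-up instance, the dynamic programming (Shapley) operator averages the best moves of the two players, a discrete infinity-Laplacian-type map; iterating it from zero merely estimates the mean payoff per turn, whereas the paper's policy iteration computes it exactly:

    import numpy as np

    def tug_of_war_operator(neighbors, payoff):
        # One turn: a fair coin decides whether the maximizer or the
        # minimizer moves the token along an edge; payoff[s] is the
        # reward collected in state s.
        def F(x):
            return np.array([payoff[s]
                             + 0.5 * max(x[t] for t in neighbors[s])
                             + 0.5 * min(x[t] for t in neighbors[s])
                             for s in range(len(neighbors))])
        return F

    neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # 4-cycle
    payoff = [1.0, 0.0, 0.0, 0.0]
    F, x, K = tug_of_war_operator(neighbors, payoff), np.zeros(4), 2000
    for _ in range(K):
        x = F(x)
    print(x / K)   # F^K(0)/K approximates the mean payoff vector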

    Numerical computation of spectral elements in max-plus algebra

    We describe the specialization to max-plus algebra of Howard’s policy improvement scheme, which yields an algorithm to compute the solutions of spectral problems in the max-plus semiring. Experimentally, the algorithm shows a remarkable (almost linear) average execution time.
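
    The max-plus spectral problem is A ⊗ x = λ ⊗ x, i.e. max_j (A_ij + x_j) = λ + x_i, and for an irreducible matrix the eigenvalue λ is the maximum cycle mean of the weighted graph of A. The sketch below computes λ by Karp's classical algorithm rather than by the Howard policy-improvement scheme the paper specializes (Karp is shorter to state; the abstract's point is that the Howard scheme is experimentally much faster):

    import math

    NEG = -math.inf   # encodes "no edge" in the max-plus semiring

    def max_cycle_mean(A):
        # Karp: lambda = max_v min_k (D[n][v] - D[k][v]) / (n - k), where
        # D[k][v] is the maximum weight of a path with exactly k edges
        # from node 0 to v. Assumes the graph of A is strongly connected.
        n = len(A)
        D = [[NEG] * n for _ in range(n + 1)]
        D[0][0] = 0.0
        for k in range(1, n + 1):
            for v in range(n):
                D[k][v] = max((D[k - 1][u] + A[u][v] for u in range(n)
                               if A[u][v] > NEG and D[k - 1][u] > NEG),
                              default=NEG)
        best = NEG
        for v in range(n):
            if D[n][v] == NEG:
                continue
            best = max(best, min((D[n][v] - D[k][v]) / (n - k)
                                 for k in range(n) if D[k][v] > NEG))
        return best

    A = [[NEG, 2.0, NEG],
         [NEG, NEG, 3.0],
         [1.0, NEG, 5.0]]
    print(max_cycle_mean(A))   # cycle means are 2.0 and 5.0 -> prints 5.0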

    Dynamics of Min-Max Functions

    Functions F : R^n → R^n which are nonexpansive in the ℓ∞ norm and homogeneous, F_i(x_1 + h, …, x_n + h) = F_i(x_1, …, x_n) + h (so-called topical functions), have appeared recently in the work of several authors. They include (after suitable transformation) nonnegative matrices, Leontief substitution systems, Bellman operators of games and of Markov decision processes, examples arising from discrete event systems (digital circuits, computer networks, etc.) and the min-max functions studied in this paper. Any topical function F can be approximated by min-max functions in a way which preserves some of the dynamics of F. We attempt, therefore, to clarify the dynamics of min-max functions, with a view to developing a generalised Perron-Frobenius theory for topical functions. Our main concern is with the existence of generalised fixed points, where F(x_1, …, x_n) = (x_1 + h, …, x_n + h), which correspond…
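
    A quick numerical illustration with a made-up min-max function: each component is a min/max of terms of the form x_j + constant, so the two defining topical properties can be checked directly, and iterating F estimates the cycle-time vector lim_k F^k(x)/k, whose coordinates all equal h whenever a generalised fixed point F(x) = (x_1 + h, …, x_n + h) exists:

    import numpy as np

    def F(x):   # a made-up min-max function on R^3
        return np.array([
            max(x[0] + 1.0, min(x[1] + 2.0, x[2])),
            min(x[0], x[2] + 3.0),
            max(x[1] - 1.0, x[2] + 0.5),
        ])

    rng = np.random.default_rng(0)
    x, y, h = rng.normal(size=3), rng.normal(size=3), 1.7
    assert np.allclose(F(x + h), F(x) + h)          # additive homogeneity
    assert np.max(np.abs(F(x) - F(y))) <= np.max(np.abs(x - y)) + 1e-12
                                                    # sup-norm nonexpansive
    z = np.zeros(3)
    for _ in range(5000):
        z = F(z)
    print(z / 5000)   # estimate of the cycle-time vector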